
    Reducing communication in sparse solvers

    Sparse matrix operations dominate the cost of many scientific applications. In parallel, the performance and scalability of these operations are limited by irregular point-to-point communication. This dissertation investigates multiple methods for reducing the communication costs of sparse matrix operations. Algorithmic changes can reduce communication requirements, but they also affect the accuracy of the operation, which can slow the convergence of scientific codes. We investigate a method of systematically removing relatively small non-zeros throughout an algebraic multigrid hierarchy, which significantly reduces the cost of sparse matrix-vector multiplication. The reduction in per-iteration communication cost outweighs the cost of the extra solver iterations caused by the less accurate multiplication. As a result, sparsification improves both the performance and the scalability of algebraic multigrid. Alterations to the parallel implementation of MPI communication also reduce costs, with no effect on accuracy. We investigate methods of agglomerating messages on-node before injecting them into the network, reducing the amount of costly inter-node communication. This node-aware communication improves both the performance and the scalability of matrix operations, particularly in strong scaling studies. Furthermore, we show that reducing communication costs in sparse matrix operations lowers the overall cost of algebraic multigrid. Finally, performance models can be used to analyze the costs of matrix operations and to identify the dominant source of communication cost, such as initializing messages or transporting bytes of data. We investigate methods of improving traditional performance models of irregular point-to-point communication by adding node-awareness, queue search costs, and network contention penalties.
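    To make "removing relatively small non-zeros" concrete, here is a minimal sketch on a single CSR matrix. The dropping rule (drop an off-diagonal entry when its magnitude is below `theta` times the largest off-diagonal magnitude in its row, then add the dropped value to the diagonal so the row sum is preserved) is an illustrative assumption, not the dissertation's exact criterion; the function name `sparsify_csr` and the parameter `theta` are hypothetical. In an AMG setting, something like this would be applied to the operators throughout the hierarchy.

```python
from typing import List, Tuple


def sparsify_csr(
    indptr: List[int], indices: List[int], data: List[float], theta: float
) -> Tuple[List[int], List[int], List[float]]:
    """Drop small off-diagonal entries of a CSR matrix, row by row.

    Hypothetical rule (an assumption, not taken from the dissertation):
    a_ij (j != i) is dropped when |a_ij| < theta * max_{k != i} |a_ik|.
    Dropped values are added to the diagonal so each row sum is unchanged.
    Assumes every row stores its diagonal entry; otherwise the dropped
    values are simply discarded.
    """
    new_indptr = [0]
    new_indices: List[int] = []
    new_data: List[float] = []
    for i in range(len(indptr) - 1):
        start, end = indptr[i], indptr[i + 1]
        row = list(zip(indices[start:end], data[start:end]))
        off_diag = [abs(v) for j, v in row if j != i]
        cutoff = theta * max(off_diag) if off_diag else 0.0
        # Total value removed from this row, to be lumped onto the diagonal.
        lumped = sum(v for j, v in row if j != i and abs(v) < cutoff)
        for j, v in row:
            if j != i and abs(v) < cutoff:
                continue  # dropped entry: one fewer non-zero in this row
            new_indices.append(j)
            new_data.append(v + lumped if j == i else v)
        new_indptr.append(len(new_indices))
    return new_indptr, new_indices, new_data


if __name__ == "__main__":
    # 3x3 example: the weak -0.01 couplings between rows 0 and 2 are dropped.
    indptr = [0, 3, 6, 9]
    indices = [0, 1, 2, 0, 1, 2, 0, 1, 2]
    data = [4.0, -1.0, -0.01, -1.0, 4.0, -1.0, -0.01, -1.0, 4.0]
    p, idx, vals = sparsify_csr(indptr, indices, data, theta=0.1)
    print(p)    # [0, 2, 5, 7]
    print(idx)  # [0, 1, 0, 1, 2, 1, 2]
```

    In a distributed SpMV, every dropped entry in an off-process column is one fewer value that may need to be fetched; the trade-off described above is that the sparsified operator is less accurate, so the solver may need more iterations.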

    MPI Advance: Open-Source Message Passing Optimizations

    The many production implementations of the Message Passing Interface (MPI) each provide their own, varying underlying algorithms. Each emerging supercomputer supports one or a small number of system MPI installations, tuned for the given architecture. Performance varies from one MPI version to another, but application programmers are typically unable to achieve optimal performance with locally installed MPI libraries and therefore rely on whichever implementation is provided as the system install. This paper presents MPI Advance, a collection of libraries that sit on top of MPI and optimize the underlying performance of any existing MPI library. The libraries provide optimizations for collectives, neighborhood collectives, partitioned communication, and GPU-aware communication.
    Comment: Available on conference website: https://eurompi23.github.io/assets/papers/EuroMPI23_paper_33.pd

    Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism

    Irregular communication often limits both the performance and scalability of parallel applications. Applications typically implement irregular communication as individual point-to-point messages, and any optimizations are added directly into the application, so these optimizations lack portability. There is no easy way to optimize point-to-point messages within MPI, because the interface for a single message provides no information about the full set of communication to be performed. However, the persistent neighborhood collective API introduced in the MPI 4 standard provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for optimizing irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication with neighborhood collectives in existing codebases such as Hypre BoomerAMG, and shows a speedup of up to 1.32x for sparse matrix-vector multiplication within a BoomerAMG solve when using our optimized neighborhood collectives. The authors analyze multiple implementations of neighborhood collectives, including a standard implementation that simply wraps standard point-to-point communication, as well as multiple implementations of locality-aware aggregation. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing the optimizations to be added to existing codebases regardless of the system MPI installation.
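    The benefit of locality-aware aggregation can be illustrated by counting inter-node messages for a given irregular pattern, which is the kind of whole-pattern information a neighborhood collective such as `MPI_Neighbor_alltoallv` (or its persistent MPI 4 form, `MPI_Neighbor_alltoallv_init`) receives up front. The sketch below does not call MPI. It assumes block placement of `ppn` ranks per node and a simple scheme in which all data from node A to node B travels as one message. It also uses a plain postal (alpha-beta) model with hypothetical `alpha` and `beta` values. All function names are hypothetical, and this is not MPI Advance's implementation.

```python
from collections import defaultdict
from typing import Dict, Tuple

# (source rank, destination rank) -> message size in bytes
Messages = Dict[Tuple[int, int], int]


def node_of(rank: int, ppn: int) -> int:
    """Node hosting `rank`, assuming block placement of `ppn` ranks per node."""
    return rank // ppn


def standard_inter_node(msgs: Messages, ppn: int) -> Tuple[int, int]:
    """Standard exchange: every off-node point-to-point message crosses the network."""
    count, nbytes = 0, 0
    for (src, dst), size in msgs.items():
        if node_of(src, ppn) != node_of(dst, ppn):
            count += 1
            nbytes += size
    return count, nbytes


def aggregated_inter_node(msgs: Messages, ppn: int) -> Tuple[int, int]:
    """Locality-aware exchange: all data from node A to node B forms one message."""
    per_node_pair: Dict[Tuple[int, int], int] = defaultdict(int)
    for (src, dst), size in msgs.items():
        a, b = node_of(src, ppn), node_of(dst, ppn)
        if a != b:
            per_node_pair[(a, b)] += size
    return len(per_node_pair), sum(per_node_pair.values())


def postal_cost(n_msgs: int, n_bytes: int, alpha: float, beta: float) -> float:
    """Postal (alpha-beta) model: alpha per message plus beta per byte."""
    return alpha * n_msgs + beta * n_bytes


if __name__ == "__main__":
    ppn = 2  # ranks 0-1 on node 0, ranks 2-3 on node 1
    msgs = {(0, 2): 100, (0, 3): 100, (1, 2): 100, (1, 3): 100, (0, 1): 50}
    print(standard_inter_node(msgs, ppn))    # (4, 400)
    print(aggregated_inter_node(msgs, ppn))  # (1, 400)
```

    Aggregation leaves the inter-node byte count unchanged here but cuts four inter-node messages to one, so under the postal model it saves roughly three message latencies. The counts omit the extra intra-node gather and redistribution steps that aggregation requires, which is why the payoff depends on the relative cost of intra-node and inter-node messages.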

    Collective-Optimized FFTs

    This paper measures the impact of the various alltoallv methods. Results are analyzed within Beatnik, a Z-model solver whose performance is bottlenecked by HeFFTe and which is representative of applications that rely on FFTs.

    Targets of Wnt/β-catenin transcription in penile carcinoma

    Penile squamous cell carcinoma (PeCa) is a rare malignancy, and little is known about the molecular mechanisms involved in its carcinogenesis. The Wnt signaling pathway, with the transcription activator β-catenin as a major transducer, is a key cellular pathway during development and in disease, particularly cancer. We used PeCa tissue arrays and multi-fluorophore-labelled quantitative immunohistochemistry to interrogate the expression of WNT4, a Wnt ligand, and three targets of Wnt/β-catenin transcriptional activation, namely MMP7, cyclin D1 (CD1), and c-MYC, in 141 penile tissue cores from 101 unique samples. The expression of all Wnt signaling proteins tested was increased 1.6- to 3-fold in PeCa samples compared to control (normal or cancer-adjacent) tissue samples (p<0.01). Expression of all proteins except CD1 was significantly lower in grade II than in grade I tumors. High-magnification deconvolved confocal images were used to measure differences in co-localization among the four proteins. Significant differences (p<0.04-0.0001) were observed for various combinations of proteins and tissue states (control, tumor grades I and II). Wnt signaling may play an important role in PeCa, and proteins of the Wnt signaling network could be useful targets for diagnosis and prognostic stratification of the disease.